
Journal of Speech, Language, and Hearing Research

American Speech-Language-Hearing Association

Preprints posted in the last 30 days, ranked by how well they match the content profile of the Journal of Speech, Language, and Hearing Research, based on 10 papers previously published here. The average preprint has a 0.00% match score for this journal, so anything above that is already an above-average fit.

1
Phonemic awareness deficits in an alphasyllabary language: Effects of task type and linguistic complexity in children with Specific Learning Disorder-Reading

Soman, A.; Dev, S. S.; Ravindren, R.

2026-04-07 psychiatry and clinical psychology 10.64898/2026.04.02.26349894 medRxiv
Top 0.1%
12.4%

Background: Phonemic awareness deficits are a core feature of Specific Learning Disorder-Reading (SLD-R). Understanding how task- and language-specific factors influence these deficits in alphasyllabary languages may help clarify the cognitive mechanisms underlying reading impairment in SLD-R.

Methods: Thirty children with a DSM-5 diagnosis of SLD-R (mean age 11.4 years) and 29 age-matched typically developing children were given phoneme blending (words and pseudowords) and segmentation tasks in Malayalam. The effects of age and consonant clusters on task performance were evaluated.

Results: Children with SLD-R performed significantly worse than controls across most phonemic awareness tasks, with the largest deficits in pseudoword blending and word blending and smaller deficits in segmentation. No significant difference was observed for initial phoneme deletion. In typically developing children, age showed strong positive correlations with phonemic performance across most tasks, whereas the SLD-R group showed weak or absent correlations, except in word blending and initial phoneme deletion. Consonant clusters significantly affected performance in both groups, with SLD-R showing more severe deficits.

Conclusions: Phonemic awareness deficits in SLD-R in alphasyllabary languages like Malayalam are most prominent in tasks where lexical support is absent, such as pseudoword blending, and they vary across task types and with linguistic complexity. Phonemic awareness improves with age in typically developing children, while improvement is uneven in children with SLD-R. The findings suggest that phonemic awareness deficits are a core feature of SLD-R across languages, but their manifestation is shaped by the orthographic and linguistic characteristics of the writing system.

2
Speech-Based Markers in Paediatric ADHD: A Longitudinal Case-Control Study of Voice Features and Medication Effects

Bamberger, R.; Kuhles, G.; Lotter, L. D.; Dukart, J.; Konrad, K.; Guenther, T.; Siniatchkin, M.; Fuchs, M.; von Polier, G.

2026-03-31 psychiatry and clinical psychology 10.64898/2026.03.25.26348708 medRxiv
Top 0.1%
4.3%

Background: Diagnosis and treatment monitoring of attention-deficit/hyperactivity disorder (ADHD) largely rely on subjective assessments, highlighting the need for objective markers. Voice features and speech embeddings are promising candidates for such markers, as they may capture alterations in speech production relevant to ADHD. However, it remains unclear which speech features are most informative for distinguishing ADHD and monitoring treatment effects, and which speech tasks most reliably elicit such differences.

Methods: Twenty-seven children with ADHD and 27 age-matched neurotypical controls completed six speech tasks across two study visits. Children with ADHD were unmedicated at baseline (first visit) and were assessed under prescribed methylphenidate treatment at follow-up, whereas controls underwent repeated assessment without intervention. Established acoustic voice features (eGeMAPS) and high-dimensional speech embeddings (WavLM, Whisper) were extracted and analysed using linear mixed models to examine baseline group differences and group-by-time interaction effects reflecting medication-associated change patterns.

Results: At baseline, children with ADHD differed significantly from controls in frequency, spectral, and temporal voice features, characterized by lower and more variable pitch, altered spectral properties, and reduced rhythmic stability. Group-by-time interaction effects indicated medication-associated modulation in the ADHD group, including reduced loudness variability and increased precision of vowel articulation at follow-up, changes not observed in controls. Speech embeddings revealed additional baseline and interaction effects beyond established acoustic features. Free speech tasks, particularly picture description, yielded the most robust and consistent effects.

Conclusion: Children with ADHD differed from neurotypical controls in vocal features at baseline and showed distinct longitudinal change patterns consistent with medication-related change. These findings support further investigation of speech-based measures as candidate digital phenotypes and potential digital biomarkers in ADHD, with picture description emerging as a particularly promising task for future clinical assessment protocols.
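
To make the analysis pattern concrete, here is a minimal sketch of an eGeMAPS-plus-mixed-model workflow of the kind this abstract describes, assuming the opensmile and statsmodels Python packages; the file names, CSV layout, and feature column are hypothetical, and this is not the authors' code.

```python
# Sketch only: eGeMAPS extraction + group-by-time linear mixed model.
# Assumes opensmile and statsmodels; paths and column names are hypothetical.
import opensmile
import pandas as pd
import statsmodels.formula.api as smf

# Extract the 88 eGeMAPS functionals for one recording.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)
feats = smile.process_file("child01_visit1_picture_description.wav")  # hypothetical file

# Suppose egemaps_by_recording.csv holds one row per recording with columns
# subject, group (ADHD/control), visit (baseline/followup), f0_mean, ...
df = pd.read_csv("egemaps_by_recording.csv")  # hypothetical file

# A significant group:visit term is the medication-associated change pattern;
# the random intercept per child handles the repeated visits.
model = smf.mixedlm("f0_mean ~ group * visit", df, groups=df["subject"]).fit()
print(model.summary())
```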

3
Training-Free Cross-Lingual Dysarthria Severity Assessment via Phonological Subspace Analysis in Self-Supervised Speech Representations

Muller, B.; Ortiz Barranon, A. A.; Roberts, L.

2026-04-17 neurology 10.64898/2026.04.12.26350731 medRxiv
Top 0.1%
1.7%

Dysarthric speech severity assessment typically requires either trained clinicians or supervised machine learning models built from labelled pathological speech data, limiting scalability across languages and clinical settings. We present a training-free method (no supervised severity model is trained; feature directions are estimated from healthy control speech using a pretrained forced aligner) that quantifies dysarthria severity by measuring the degradation of phonological feature subspaces within frozen HuBERT representations. For each speaker, we extract phone-level embeddings via the Montreal Forced Aligner (MFA), compute d′ scores along phonological contrast directions (nasality, voicing, stridency, sonorance, manner, and four vowel features) derived exclusively from healthy control speech, and construct a 12-dimensional phonological profile. Evaluating 890 speakers across 10 corpora, covering 5 languages for the full MFA pipeline (English, Spanish, Dutch, Mandarin, French) and 3 primary aetiologies (Parkinson's disease, cerebral palsy, amyotrophic lateral sclerosis), we find that all five consonant d′ features correlate significantly with clinical severity (random-effects meta-analysis rho = -0.50 to -0.56, p < 2 × 10^-4; pooled Spearman rho = -0.47 to -0.55, with bootstrap 95% CIs not crossing zero), with the effect replicating within individual corpora, surviving FDR correction, and remaining robust to leave-one-corpus-out removal and alignment quality controls. Nasality d′ decreases monotonically from control to severe in 6 of 7 severity-graded corpora. Mann-Whitney U tests confirm that all 12 features distinguish controls from severely dysarthric speakers (p < 0.001). The method requires no dysarthric training data, applies to any language with an existing MFA acoustic model (currently 29 languages) or a model trained from healthy speech alone, and produces clinically interpretable per-feature profiles. We release the full pipeline and phone feature configurations for six languages to support replication and clinical adoption.

Author Summary: One of the authors has lived with ALS for sixteen years. Bernard Muller, who built this entire analytical pipeline using only eye-tracking technology, has experienced the progression of the disease firsthand, including the dysarthric speech that comes with advancing ALS and the tracheostomy that followed. The problem this paper addresses is not abstract to him, and that shapes how the method was designed. We developed a method to measure how well a person with dysarthria can produce distinct speech sounds, without needing any recordings of disordered speech for training. Our approach works by analysing how a widely available AI speech model organises different sound categories, such as nasal versus oral consonants or voiced versus voiceless sounds, and measuring whether those categories become harder to tell apart. We tested this on 890 speakers across 10 datasets in five languages, covering Parkinson's disease, cerebral palsy, and ALS. Because the method only needs healthy speech recordings to set up, it applies to any language with an existing acoustic model, currently covering 29 languages. The resulting profiles show clinicians which specific aspects of speech production are degrading, rather than providing a single opaque severity score. This could support remote monitoring of speech decline in neurodegenerative disease and enable screening in languages and settings where specialist assessment is unavailable.
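
As a rough illustration of the d′ computation the abstract names (not the released pipeline), the sketch below assumes phone-level HuBERT embeddings have already been pooled per phone occurrence; all arrays here are random stand-ins.

```python
# Sketch only: d-prime along a phonological contrast direction estimated from
# healthy control speech, then applied to a test speaker. Arrays are stand-ins.
import numpy as np

def contrast_direction(ctrl_a, ctrl_b):
    """Unit vector joining the control-speech class means (e.g. nasal vs. oral)."""
    d = ctrl_b.mean(axis=0) - ctrl_a.mean(axis=0)
    return d / np.linalg.norm(d)

def dprime(spk_a, spk_b, direction):
    """Separation of one speaker's two phone classes along the fixed direction."""
    pa, pb = spk_a @ direction, spk_b @ direction
    pooled_sd = np.sqrt(0.5 * (pa.var(ddof=1) + pb.var(ddof=1)))
    return abs(pa.mean() - pb.mean()) / pooled_sd

rng = np.random.default_rng(0)
ctrl_nasal = rng.normal(0.0, 1.0, (500, 768))  # healthy /m, n/ embeddings
ctrl_oral = rng.normal(0.5, 1.0, (500, 768))   # healthy /b, d/ embeddings
direction = contrast_direction(ctrl_nasal, ctrl_oral)

spk_nasal = rng.normal(0.0, 1.0, (80, 768))    # one test speaker's phones
spk_oral = rng.normal(0.2, 1.0, (80, 768))     # degraded contrast -> lower d'
print(f"nasality d' = {dprime(spk_nasal, spk_oral, direction):.2f}")
```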

4
Transformer Language Models Reveal Distinct Patterns in Aphasia Subtypes and Recovery Trajectories

Ahamdi, S. S.; Fridriksson, J.; Den Ouden, D.

2026-03-27 neuroscience 10.64898/2026.03.27.714240 medRxiv
Top 0.1%
1.7%

Language impairments in aphasia are characterized by various representational disruptions that may be reflected in discourse production. This research examines the capacity of transformer-based language models, particularly GPT-2, to serve as a computational framework for analyzing variations in aphasic narrative speech. A longitudinal dataset of narrative speech samples collected at six time points from individuals with aphasia (N = 47) was utilized as part of an intervention study. All transcripts were processed via the GPT-2 language model to obtain activation values from each of the 12 transformer layers. Statistically significant differences in activation magnitude across aphasia subtypes were found at every layer (all p < .001), with the most pronounced effects in the deeper layers. Pairwise Tukey HSD tests revealed consistent distinctions between Broca's aphasia and both Anomic and Wernicke's aphasia, suggesting a shared activation profile between the latter two. Longitudinal tests revealed significant changes over time, especially in the final three layers (10-12). These findings suggest that transformer-based activation patterns reflect meaningful variation in aphasic discourse and could complement current diagnostic tools. Overall, GPT-2 provides a scalable tool to model representational dynamics in aphasia and enhance the clinical interpretability of deep language models.
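
For orientation, here is a minimal sketch of pulling per-layer GPT-2 activations from a transcript with the Hugging Face transformers package; the mean-absolute-activation aggregation is one plausible reading of "activation magnitude", not necessarily the authors' exact measure, and the transcript string is a placeholder.

```python
# Sketch only: per-layer GPT-2 activation magnitudes for one transcript.
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True).eval()

transcript = "the boy is reaching for the cookie jar"  # placeholder sample
inputs = tokenizer(transcript, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple (embedding layer + 12 transformer layers),
# each of shape (1, seq_len, 768); keep the 12 transformer layers.
layer_magnitudes = [h.abs().mean().item() for h in outputs.hidden_states[1:]]
for i, m in enumerate(layer_magnitudes, start=1):
    print(f"layer {i:2d}: mean |activation| = {m:.3f}")
```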

5
Neural subtypes in developmental stuttering

Nanda, S.; Gervino, G.; Pang, C. Y.; Garnett, E. O.; Usler, E.; Chugani, D. C.; Chang, S.-E.; Chow, H. M.

2026-03-26 neuroscience 10.64898/2026.03.25.714210 medRxiv
Top 0.1%
1.7%

Developmental stuttering is a complex neurodevelopmental disorder characterized by disfluent speech. At the individual level, the behavioral manifestations of stuttering vary considerably, likely reflecting heterogeneity in underlying neural mechanisms. In this study, we examined individual-specific differences in the brains of children who stutter (CWS) by implementing normative modeling, a framework that quantifies how an individual deviates from an age- and sex-matched reference population. We applied this approach to identify individual-specific structural brain atypicalities using gray and white matter volumes. These volumes were derived from MRI scans from a large mixed-longitudinal dataset of 235 and 240 scans from CWS and fluent controls, respectively, aged between 3 and 12 years. Individual deviation maps capturing these atypicalities were then used to cluster CWS into subtypes based on similarities in their neuroanatomical profiles. This analysis identified four neural subtypes with distinct neuroanatomical atypicalities relative to fluent controls. The key findings were a basal ganglia-thalamo-cerebellar subtype associated with higher stuttering severity and lower rates of recovery, and a white matter subtype characterized by mild severity and a higher likelihood of recovery. The remaining two subtypes showed cerebellar differences alongside alterations in brain regions involved in sensorimotor integration. Moreover, cerebellar volume atypicalities were present in all four subtypes, indicating that cerebellar alterations occurred across otherwise distinct neural profiles and may represent a shared neuroanatomical feature of stuttering. These findings indicate that examining individual-specific neural differences and subtyping based on patterns of neural atypicalities provides valuable insight into the heterogeneity of developmental stuttering and represents a promising direction for improving our understanding of the disorder.
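
A compact sketch of the normative-modeling-then-clustering logic described here: fit a reference model of regional volume on age and sex in controls, convert each CWS scan to a deviation (z) map, then cluster the maps. The linear reference model and synthetic data below are deliberate simplifications; published normative models are typically more flexible.

```python
# Sketch only: deviation z-maps relative to a control reference, then subtyping.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n_ctrl, n_cws, n_regions = 240, 235, 100  # scan counts mirror the abstract
X_ctrl = np.column_stack([rng.uniform(3, 12, n_ctrl), rng.integers(0, 2, n_ctrl)])
Y_ctrl = rng.normal(size=(n_ctrl, n_regions))   # stand-in control volumes
X_cws = np.column_stack([rng.uniform(3, 12, n_cws), rng.integers(0, 2, n_cws)])
Y_cws = rng.normal(size=(n_cws, n_regions))     # stand-in CWS volumes

z_maps = np.empty_like(Y_cws)
for r in range(n_regions):
    ref = LinearRegression().fit(X_ctrl, Y_ctrl[:, r])  # volume ~ age + sex
    resid_sd = np.std(Y_ctrl[:, r] - ref.predict(X_ctrl), ddof=2)
    z_maps[:, r] = (Y_cws[:, r] - ref.predict(X_cws)) / resid_sd

subtypes = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(z_maps)
print(np.bincount(subtypes))  # children per neural subtype
```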

6
Naming Performance in Bilinguals with Alzheimer's Disease and Mild Cognitive Impairment

Sainz-Pardo, M.; Hernandez, M.; Suades, A.; Juncadella, M.; Ortiz-Gil, J.; Ugas, L.; Sala, I.; Lleo, A.; Calabria, M.

2026-03-25 neurology 10.64898/2026.03.23.26349075 medRxiv
Top 0.1%
0.9%

Introduction: There is consistent evidence of a disadvantage in bilinguals' speech production compared to monolinguals in healthy individuals, but studies investigating this phenomenon in clinical populations such as Mild Cognitive Impairment (MCI) and Alzheimer's Disease (AD) are scarce. Given that both clinical groups are characterized by word-finding difficulties, understanding how bilingualism influences speech production in these populations is essential.

Methods: Early and highly proficient Catalan-Spanish bilinguals (active bilinguals) were compared to Spanish-dominant speakers with low proficiency in Catalan (passive bilinguals) using a picture-naming task. The study included 58 older adults, 66 patients with AD, and 124 individuals with MCI. Reaction times, accuracy, and error types were collected in the naming task in each individual's dominant language.

Results: First, active bilinguals demonstrated faster naming latencies than passive bilinguals, particularly for low-frequency words. Second, active bilinguals with MCI exhibited more naming errors than passive bilinguals with MCI, including a higher incidence of cross-language intrusions and anomia. Third, passive bilinguals with MCI and AD showed more semantic errors than active bilinguals.

Discussion: These findings underscore the impact of second language use on naming performance in MCI and AD. Moreover, they provide insight into the potential mechanisms underlying lexical retrieval differences in bilinguals, including lexico-semantic processing and language control.

7
Speech-in-Noise Difficulties in Aminoglycoside Ototoxicity Reflect Combined Afferent and Efferent Dysfunction

Motlagh Zadeh, L.; Izhiman, D.; Blankenship, C. M.; Moore, D. R.; Martin, D. K.; Garinis, A.; Feeney, P.; Hunter, L. R.

2026-03-26 otolaryngology 10.64898/2026.03.23.26348719 medRxiv
Top 0.1%
0.7%

Objectives: Patients with cystic fibrosis (CF) often receive aminoglycosides (AGs) to manage recurrent pulmonary infections, placing them at risk for ototoxicity. Chronic AG use can lead to complex cochlear damage affecting inner and outer hair cells, the stria vascularis, and spiral ganglion neurons. The greatest damage is typically in the basal cochlear region, which encodes high-frequency hearing, with additional involvement of more apical regions. While extended-high-frequency (EHF) hearing loss (EHFHL; 9-16 kHz) is often the earliest sign of AG ototoxicity, speech-in-noise (SiN) effects are rarely studied. Our overall hypothesis is that SiN perception difficulties in individuals with CF treated with AGs are related to combined cochlear and neural damage, primarily in the EHF range but also in the standard frequency (SF; 0.25-8 kHz) range. Three mechanisms that contribute to SiN perception were evaluated in children and young adults: 1) a primary effect of reduced EHF sensitivity, measured by pure-tone audiometry (PTA) and transient-evoked otoacoustic emissions (TEOAEs); 2) a secondary effect of subclinical damage in the SF range, measured by PTA and TEOAEs; and 3) additional neural effects, measured by middle ear muscle reflex (MEMR) thresholds (afferent) and growth functions (efferent).

Design: A total of 185 participants were enrolled: 101 individuals with CF treated with intravenous AGs and 84 age- and sex-matched controls without hearing concerns or CF. Assessments included EHF and SF PTA; the Bamford-Kowal-Bench (BKB)-SIN test for SiN perception; double-evoked TEOAEs with chirp stimuli from 0.71 to 14.7 kHz; and ipsilateral and contralateral wideband MEMR thresholds and growth functions using broadband stimuli.

Results: Reduced sensitivity at EHFs (PTA, TEOAEs) was not associated with impaired SiN perception in the CF group. SF hearing, regardless of EHF status, was the primary predictor of SiN performance in the CF group. Increased MEMR growth was also significantly associated with poorer SiN in the CF group.

Conclusions: In CF, impaired SiN perception was primarily predicted by SF hearing impairment, with additional involvement of the efferent auditory pathway through increased MEMR growth. These results build on prior evidence for efferent neural effects due to ototoxic exposures, supporting both sensory (afferent) and neural (efferent) mechanisms that contribute to listening difficulties in CF. Thus, preventive and intervention strategies should consider these combined mechanisms in people with AG ototoxicity to address their SiN problems.

8
Can Multimodal Large Language Models Visually Interpret Auditory Brainstem Responses?

Jedrzejczak, W.; Kochanek, K.; Skarzynski, H.

2026-04-17 otolaryngology 10.64898/2026.04.15.26350944 medRxiv
Top 0.1%
0.5%

Introduction: Auditory brainstem response (ABR) is a standard objective method for estimating hearing threshold, especially in patients who cannot reliably participate in behavioral audiometry. However, ABR interpretation is usually performed by an expert. This study evaluated whether two general-purpose artificial intelligence (AI) multimodal large language model (LLM) chatbots, ChatGPT and Qwen, can accurately estimate ABR hearing thresholds from ABR waveform images. Accuracy was measured by comparison with the judgements of three expert audiologists.

Methods: A total of 500 images, each containing several ABR waveforms recorded at different stimulus intensities, were analyzed. Three expert audiologists established the reference auditory thresholds based on visual identification of wave V at the lowest stimulus intensity, with the most frequent judgement among the three used as the reference. Each waveform image was independently submitted to ChatGPT (version 5.1) and Qwen (version 3 Max) using the same standardized prompt and without additional clinical context. Agreement with the expert thresholds was assessed as mean errors and correlations. Sensitivity and specificity for detecting hearing loss (>20 dB nHL) were also calculated. In cases where the AI and expert thresholds nominally matched, corresponding latency measures were also compared.

Results: Auditory thresholds derived from both LLMs correlated strongly with expert opinion, with Pearson r = 0.954 for ChatGPT and r = 0.958 for Qwen. ChatGPT showed a mean error of +5.5 dB and Qwen showed a mean error of -2.7 dB. Exact nominal agreement with expert values was achieved in 34.6% of ChatGPT estimates and 35.6% of Qwen estimates; agreement within ±10 dB was observed in 75.6% and 80.0% of cases, respectively. For hearing-loss classification, ChatGPT achieved 100% sensitivity but low specificity (20.4%), whereas Qwen showed a more balanced profile with 91.6% sensitivity and 67.5% specificity. Curiously, estimates of wave V latency were markedly poor for both LLMs, with systematic underestimation and weak correlations with the expert judgements.

Conclusion: ChatGPT and Qwen demonstrated a moderate ability to estimate ABR thresholds from waveform images, although their performance was not good enough for independent clinical use. Both models captured general patterns of hearing loss severity, but there was systematic bias, limited balance between sensitivity and specificity, and poor latency estimation. General-purpose multimodal LLMs may have potential as assistive or preliminary tools, but clinically reliable ABR interpretation will likely require specialized, domain-trained AI systems with expert oversight.
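
The agreement statistics reported here are straightforward to reproduce in principle; below is a sketch with toy threshold values (not study data), assuming numpy and scipy.

```python
# Sketch only: mean error, Pearson r, and sensitivity/specificity for the
# >20 dB nHL hearing-loss cutoff. The threshold arrays are invented toy data.
import numpy as np
from scipy.stats import pearsonr

expert = np.array([10, 20, 30, 40, 20, 50, 10, 60])  # dB nHL, reference
llm = np.array([20, 20, 40, 40, 30, 60, 20, 50])     # dB nHL, model estimate

print("mean error:", np.mean(llm - expert), "dB")
r, _ = pearsonr(expert, llm)
print("Pearson r:", round(float(r), 3))

loss_true, loss_pred = expert > 20, llm > 20
sensitivity = np.sum(loss_pred & loss_true) / loss_true.sum()
specificity = np.sum(~loss_pred & ~loss_true) / (~loss_true).sum()
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```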

9
Perceived vs. actual navigation ability: Differences between autistic and typically developing children

McKeown, D. J.; Cruzado, O. S.; Colombo, G.; Angus, D. J.; Schinazi, V. R.

2026-04-13 psychiatry and clinical psychology 10.64898/2026.04.09.26350542 medRxiv
Top 0.1%
0.5%

Purpose: Navigational ability develops throughout childhood alongside the maturation of brain regions supporting egocentric and allocentric processing. In Autism Spectrum Disorder (ASD), atypical hippocampal development may impact flexible spatial memory; however, findings on navigational ability in autistic children remain inconsistent. This study aimed to compare both objective and perceived navigation ability in children with ASD and typically developing (TD) peers.

Method: Twenty-six children with high-functioning ASD and twenty-five age- and gender-matched TD children (M_age = 12.04 years, SD = 1.64) completed a battery of navigational tasks from the Spatial Performance Assessment for Cognitive Evaluation (SPACE), including Path Integration, Egocentric Pointing, Mapping, Associative Memory, and Perspective Taking. Perceived navigation ability was assessed using the Santa Barbara Sense of Direction (SBSOD) scale.

Results: No significant group differences were observed across any objective navigation tasks. However, children with ASD reported significantly lower perceived navigation ability compared to TD peers.

Conclusion: These findings suggest a dissociation between perceived and actual navigational ability in ASD. By early adolescence, objective navigation performance appears intact, potentially reflecting sufficient maturation of underlying neural systems or the presence of compensatory mechanisms. The results underscore the importance of incorporating objective, task-based measures when assessing cognitive abilities in autistic populations.

10
Hearing sounds when the eyes move: A case study implicating the tensor tympani in eye movement-related peripheral auditory activity

King, C. D.; Zhu, T.; Groh, J. M.

2026-03-25 neuroscience 10.64898/2026.03.24.713974 medRxiv
Top 0.1%
0.5%

Information about eye movements is necessary for linking auditory and visual information across space. Recent work has suggested that such signals are incorporated into processing at the level of the ear itself (Gruters, Murphy et al. 2018). Here we report confirmation that the eye movement signals that reach the ear can produce perceptual consequences, via a case report of an unusual participant with tensor tympani myoclonus who hears sounds when she moves her eyes. The sounds she hears could be recorded with a microphone in the ear in which she hears them (left), and occurred for large leftward eye movements to extreme orbital positions of the eyes. The sounds elicited by this participant's eye movements were reminiscent of eye movement-related eardrum oscillations (EMREOs; Gruters, Murphy et al. 2018, Brohl and Kayser 2023, King, Lovich et al. 2023, Lovich, King et al. 2023, Lovich, King et al. 2023, Abbasi, King et al. 2025, Sotero Silva, Kayser et al. 2025, King and Groh 2026, Leon, Ramos et al. 2026, Sotero Silva, Brohl et al. 2026), but were larger and longer lasting than classical EMREOs, helping to explain why they were audible to her. Overall, the observations from this patient help establish that (a) eye movement-related signals specifically reach the tensor tympani muscle and that (b) when there is an abnormality involving that muscle, such signals can lead to actual audible percepts. Given that the tensor tympani contributes to the regulation of sound transmission in the middle ear, these findings indicate that eye movement signals reaching the ear have functional consequences for auditory perception. The findings also expand the types of medical conditions that produce gaze-evoked tinnitus, to date most commonly observed in connection with acoustic neuromas.

11
Neural correlates of novel word-form learning in developmental language disorder

Bahar, N.; Cler, G. J.; Asaridou, S. S.; Smith, H. J.; Willis, H. E.; Healy, M. P.; Chughtai, S.; Haile, M.; Krishnan, S.; Watkins, K. E.

2026-03-31 neuroscience 10.64898/2026.03.28.715039 medRxiv
Top 0.1%
0.4%

Children with developmental language disorder (DLD) have persistent language learning difficulties and often perform poorly on pseudoword repetition, a task that probes phonological, memory, and speech-motor processes that support vocabulary acquisition. Research on the neural basis of pseudoword repetition in DLD is limited. We used whole-brain functional MRI (fMRI) to examine pseudoword repetition and repetition-based learning in 46 children with DLD (ages 10-15 years) and 71 age-matched children with typical language development. During scanning, children heard and repeated pseudowords paired with visual referents, allowing us to track learning-related changes in neural activity across repetitions. Repeated pseudoword production yielded comparable behavioural learning across groups, with faster productions by later repetitions. Post-scan, form-referent recognition was comparable across groups, whereas pseudoword repetition accuracy was lower in DLD. Pseudoword repetition engaged a distributed neural network, including inferior frontal cortex bilaterally (greater on the left), premotor and sensorimotor cortex, and posterior temporal and occipital regions. Group differences emerged primarily in regions where activity was task negative (i.e., below baseline or deactivated): lateral occipito-parietal cortex (posterior angular gyrus), medial parieto-occipital cortex (retrosplenial), and right posterior cingulate cortex. Learning-related decreases in activity were similar across groups, but region-of-interest analyses showed reduced leftward lateralisation of activity in inferior frontal gyrus in DLD. These findings suggest weaker disengagement of the default mode network during a linguistically demanding task in DLD. Although repetition-based pseudoword learning recruited similar neural mechanisms in both groups, these mechanisms may operate less efficiently in DLD, alongside reduced hemispheric specialisation in inferior frontal cortex.

Highlights:
- Similar repetition-related neural attenuation across groups during pseudoword learning.
- Reduced default-mode network suppression during pseudoword repetition in DLD.
- Reduced left-hemisphere specialisation of inferior frontal cortex in DLD.
- Repetition-based learning in DLD supported by less efficient neural networks.

12
Tinnitus: An Unrecognised Symptom of Functional Neurological Disorder

Palmer, D. D. G.; Edwards, M. J.; Mattingley, J.

2026-03-19 neurology 10.64898/2026.03.16.26348516 medRxiv
Top 0.1%
0.4%

Background: Functional neurological disorder (FND) is a common neurological condition characterised by symptoms which vary characteristically with attention. In the sensory realm, these symptoms frequently take the form of 'phantom' perception in the absence of sensation. While the condition is generally regarded not to cause auditory symptoms, tinnitus is a phantom perception which varies with symptom-focused attention, and is suggested to have similar underlying mechanisms to those proposed for FND. Based on this, we hypothesized that tinnitus might reflect the same underlying process as FND, and that it would therefore be more common in people with FND (pwFND).

Methods: Using an international database, we compared the proportions of pwFND who reported tinnitus with a control group. To ensure that observed differences were not attributable to agreement bias in symptom reporting, we also conducted an experiment in which pwFND and controls were asked to report which symptoms they had experienced in the past month, 14 of which were symptoms of FND and 7 of which were unrelated.

Results: Rates of tinnitus were significantly higher in the FND group (54%, HDI 50-57%, n=732) than the control group (17%, HDI 8.5-25%, n=59). In the symptom reporting experiment, pwFND (n=38) reported more FND-related symptoms than controls (n=38), but there was no between-group difference in reporting of non-FND-related symptoms.

Discussion: Based on the markedly higher prevalence of tinnitus in pwFND than controls, and the substantial overlap in mechanisms and phenomenology, we believe tinnitus should be considered a possible symptom of FND, where both conditions reflect a failure of symptom resolution after incitement by a peripheral stimulus.
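
For readers unfamiliar with the HDI notation in the Results, the sketch below shows one common way such an interval can be obtained from a Beta posterior over a proportion; the flat Beta(1, 1) prior and grid search are assumptions for illustration, not necessarily the authors' model.

```python
# Sketch only: highest-density interval (HDI) for a proportion from a Beta
# posterior with a flat prior, evaluated at the counts implied by the abstract.
import numpy as np
from scipy.stats import beta

def beta_hdi(k, n, mass=0.95, grid=10_000):
    """Narrowest interval containing `mass` of a Beta(k+1, n-k+1) posterior."""
    post = beta(k + 1, n - k + 1)
    lo = post.ppf(np.linspace(0, 1 - mass, grid))   # candidate lower bounds
    hi = post.ppf(np.linspace(mass, 1, grid))       # matching upper bounds
    i = np.argmin(hi - lo)                          # pick the narrowest pair
    return lo[i], hi[i]

print("pwFND:", beta_hdi(round(0.54 * 732), 732))   # roughly 0.50-0.57
print("controls:", beta_hdi(round(0.17 * 59), 59))  # roughly 0.09-0.26
```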

13
Linguistic and Acoustic Biomarkers from Simulated Speech Reveal Early Cognitive Impairment Patterns in Alzheimer's Disease

Debnath, A.; Sarkar, S.

2026-04-08 neuroscience 10.64898/2026.04.08.717162 medRxiv
Top 0.2%
0.3%

Background: Alzheimer's disease (AD) causes progressive decline in language and cognition. Automated speech analysis has emerged as a promising screening tool, yet clinical data scarcity limits progress. To address this, we generated a large-scale simulated speech dataset to model linguistic and acoustic deterioration across three cognitive stages: Control, Mild Cognitive Impairment (MCI), and AD.

Methods: Using Monte Carlo simulations, we emulated the Pitt DementiaBank "Cookie Theft" narratives. Acoustic features (speech rate, pause duration, jitter, shimmer) and linguistic features (type-token ratio, unique-word count, filler usage) were synthetically sampled from real-world DementiaBank distributions. We trained an XGBoost classifier to distinguish diagnostic groups, and applied SHAP (Shapley Additive exPlanations) to assess feature importance.

Results: The model achieved high discriminative performance (AUC ≈ 0.94; accuracy ≈ 85%). Compared to controls, simulated MCI and AD groups showed progressive declines in fluency and lexical diversity, and increases in disfluencies and voice instability. SHAP analysis revealed that key predictors included reduced type-token ratio, higher pause and filler rates, and elevated jitter/shimmer. Classification was most accurate for Control vs. AD; MCI misclassifications highlighted intermediate profiles.

Interpretation: Our framework, FMN (Forget Me Not), captures clinically relevant speech changes using simulated data, offering an explainable and scalable approach for cognitive screening. While not a substitute for real datasets, FMN validates a pipeline that mirrors known AD markers and can guide future real-world deployments. External validation remains a key next step for translational impact.
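
A minimal sketch of the XGBoost-plus-SHAP pattern named in the Methods, assuming the xgboost and shap packages; the feature matrix below is synthetic stand-in data, not the FMN dataset.

```python
# Sketch only: three-class classifier with SHAP feature attributions.
import numpy as np
import shap
import xgboost
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
feature_names = ["speech_rate", "pause_duration", "jitter", "shimmer",
                 "type_token_ratio", "unique_words", "filler_rate"]
X = rng.normal(size=(600, len(feature_names)))  # stand-in feature matrix
y = rng.integers(0, 3, 600)                     # 0=Control, 1=MCI, 2=AD

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = xgboost.XGBClassifier(n_estimators=200, max_depth=4, eval_metric="mlogloss")
clf.fit(X_tr, y_tr)

# SHAP values attribute each prediction to the input features, per class.
explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X_te)
print("test accuracy:", clf.score(X_te, y_te))
print("SHAP output shape:", np.shape(shap_values))
```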

14
Automated detection of adult autism from vowel acoustics using machine learning

Georgiou, G. P.; Paphiti, M.

2026-04-04 health informatics 10.64898/2026.04.03.26350102 medRxiv
Top 0.2%
0.3%

Autism spectrum disorder (ASD) is a neurodevelopmental condition for which timely and accurate detection remains a major clinical priority. Early and reliable identification is important because it can facilitate access to assessment, diagnosis, and appropriate support; however, current diagnostic pathways still rely largely on behavioural evaluation and clinical judgement. In this context, machine-learning (ML) approaches have attracted growing interest because they can identify subtle and complex patterns in speech data that may not be easily captured through conventional methods. The current study capitalizes on this potential by developing and evaluating ML models for distinguishing autistic individuals from neurotypical individuals based on speech features. More specifically, acoustic features of vowels, including fundamental frequency (F0), first three formants (F1, F2, F3), duration, jitter, shimmer, harmonics-to-noise ratio (HNR), and intensity, were elicited from 18 autistic adults and 18 neurotypical adults through a controlled production task. Then, four supervised ML models were trained and evaluated on these features: LightGBM, Random Forest, Support Vector Machine, and XGBoost. All models demonstrated good classification performance, with the best-performing model achieving a strong discriminability of 89%. The explainability analysis identified F0 as the most influential predictor by a substantial margin, followed by intensity, F3, and F1, while duration, shimmer, HNR, jitter, and F2 contributed more modestly. These findings demonstrate that vowel acoustics contain clinically relevant information for distinguishing autistic from neurotypical adult speech and highlight the potential of interpretable, speech-based ML as a transparent and scalable aid for ASD screening and assessment.
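
The acoustic measures listed here can all be obtained from Praat; below is a sketch using the parselmouth Python interface, with a hypothetical file name and standard Praat parameter values that may differ from the authors' settings.

```python
# Sketch only: vowel acoustics (F0, F1-F3, duration, jitter, shimmer, HNR,
# intensity) via Praat through parselmouth. File name is hypothetical.
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("speaker01_vowel_a.wav")  # hypothetical recording

f0_mean = call(snd.to_pitch(), "Get mean", 0, 0, "Hertz")
intensity_mean = call(snd.to_intensity(), "Get mean", 0, 0, "energy")
formants = snd.to_formant_burg()
f1 = call(formants, "Get mean", 1, 0, 0, "hertz")
f2 = call(formants, "Get mean", 2, 0, 0, "hertz")
f3 = call(formants, "Get mean", 3, 0, 0, "hertz")

# Jitter/shimmer need a glottal pulse train (PointProcess).
pulses = call(snd, "To PointProcess (periodic, cc)", 75, 600)
jitter = call(pulses, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
shimmer = call([snd, pulses], "Get shimmer (local)",
               0, 0, 0.0001, 0.02, 1.3, 1.6)
hnr = call(snd.to_harmonicity_cc(), "Get mean", 0, 0)

print(dict(f0=f0_mean, f1=f1, f2=f2, f3=f3, duration=snd.duration,
           jitter=jitter, shimmer=shimmer, hnr=hnr, intensity=intensity_mean))
```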

15
Corpus for Benchmarking Clinical Speech De-identification

Dai, H.-J.; Fang, L.-C.; Mir, T. H.; Chen, C.-T.; Feng, H.-H.; Lai, J.-R.; Hsu, H.-C.; Nandy, P.; Panchal, O.; Liao, W.-H.; Tien, Y.-Z.; Chen, P.-Z.; Lin, Y.-R.; Jonnagaddala, J.

2026-04-03 health informatics 10.64898/2026.03.31.26349906 medRxiv
Top 0.2%
0.3%

Objectives: Publicly available datasets dedicated to clinical speech de-identification tasks remain scarce due to privacy constraints and the complexity of speech-level annotation. To address this gap, we compiled the SREDH-AICup sensitive health information (SHI) speech corpus, a time-aligned clinical speech dataset annotated across 38 SHI categories.

Methods: Two publicly available English medical-domain datasets were adapted to support speech-level de-identification, including script reformulation and controlled re-recording by 25 participants. Additional Mandarin Chinese clinical-style materials were incorporated to extend linguistic coverage. All audio data were annotated with millisecond-level, time-aligned SHI spans using Label Studio. Inter-annotator agreement was evaluated using Cohen's kappa, following iterative calibration rounds. The resulting corpus supports both automatic speech recognition (ASR) and speech-level recognition of SHIs.

Results: The final dataset comprises 20 hours of annotated audio, divided into training (10 hours, 1,539 files), validation (5 hours, 775 files), and test (5 hours, 710 files) subsets, totalling 7,830 SHI entities. The language distribution reflects the composition of the selected source materials, with 19.36 hours of English and 0.89 hours of Mandarin Chinese speech.

Discussion: The corpus exhibits a long-tail distribution consistent with clinical documentation patterns and highlights the limited availability of Chinese medical speech resources. These characteristics underscore both the realism of the dataset and the structural challenges associated with multilingual speech de-identification.

Conclusion: The SREDH-AICup SHI speech corpus provides a clinically grounded, time-aligned speech dataset supporting automated medical speech de-identification research and facilitating future development of multilingual speech-based privacy protection systems.
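
The inter-annotator agreement step maps to a one-liner; here is a sketch with invented token-level SHI labels, assuming scikit-learn.

```python
# Sketch only: Cohen's kappa between two annotators' token-level SHI labels.
# The label sequences are invented placeholders, not corpus data.
from sklearn.metrics import cohen_kappa_score

annotator_1 = ["NAME", "O", "DATE", "O", "HOSPITAL", "O", "DATE", "O"]
annotator_2 = ["NAME", "O", "DATE", "O", "O", "O", "DATE", "O"]
print("Cohen's kappa:", round(cohen_kappa_score(annotator_1, annotator_2), 3))
```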

16
Measurement Equivalence of the ASRS Across the Adult Lifespan: A Differential Item Functioning Analysis

Givon-Schaham, N.; Shalev, N.

2026-04-07 psychiatry and clinical psychology 10.64898/2026.04.06.26350233 medRxiv
Top 0.2%
0.3%

Adult ADHD is increasingly recognized across the lifespan, yet the psychometric equivalence of the Adult ADHD Self-Report Scale (ASRS) remains unverified for older populations. This study examined age-related Differential Item Functioning (DIF) in 600 adults (n = 100 per decade, ages 20-80) who completed the 18-item ASRS. Using a bi-factor Graded Response Model, we extracted latent ADHD trait scores (ωH = .895) and assessed DIF via ordinal logistic regression with adaptive age modeling. Five of 18 items exhibited significant uniform DIF. At equivalent latent severity, older adults were less likely to endorse hyperactivity symptoms in Part A (fidgeting, feeling "driven by a motor") but more likely to endorse specific symptoms in Part B (careless mistakes, misplacing items, interrupting). From ages 20 to 80, expected Part A scores decreased by 1.36 points (~0.27 per decade), while Part B scores increased by 1.15 points (~0.23 per decade). These findings indicate a phenotypic redistribution of ADHD symptoms as individuals age. Because the 6-item Part A screener serves as the primary clinical gatekeeper, its concentration of negative DIF suggests standard screening practice may systematically underestimate ADHD severity in older adults. We recommend using the full 18-item ASRS when screening older populations and suggest that developing age-adjusted norms would improve diagnostic accuracy.
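
A sketch of a uniform-DIF test for a single item via ordinal logistic regression, assuming statsmodels; here `trait` stands in for the bi-factor latent score and the item responses are simulated, so this mirrors the logic rather than reproduces the authors' adaptive age modeling.

```python
# Sketch only: uniform DIF = a significant age effect on item response after
# conditioning on the latent trait. Data below are simulated stand-ins.
import numpy as np
import pandas as pd
from scipy.stats import chi2
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "trait": rng.normal(size=n),          # latent ADHD severity
    "age": rng.uniform(20, 80, n),
})
# Simulate a 5-category item whose endorsement drops with age (uniform DIF).
latent = 1.2 * df["trait"] - 0.02 * (df["age"] - 50) + rng.logistic(size=n)
df["item"] = pd.cut(latent, bins=5, labels=False)

base = OrderedModel(df["item"], df[["trait"]], distr="logit")
base_fit = base.fit(method="bfgs", disp=False)
dif = OrderedModel(df["item"], df[["trait", "age"]], distr="logit")
dif_fit = dif.fit(method="bfgs", disp=False)

lr = 2 * (dif_fit.llf - base_fit.llf)     # likelihood-ratio test, 1 df
print(f"LR = {lr:.2f}, p = {chi2.sf(lr, 1):.4f}")
```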

17
Language attrition and semi-lingualism among Liberian and Sierra Leonean refugee children: A sociolinguistic-psychological dynamic of trauma and mental health in stateless refugees in Oru, Nigeria

Yarseah, D. A.; Ibimiluyi, O. F.; Awosusi, O. O.; Flomo, J. M.; Fatai, B. F.; Olaoye, E. O.; Adesola, A. F.

2026-03-20 psychiatry and clinical psychology 10.64898/2026.03.17.26348612 medRxiv
Top 0.2%
0.3%

Background: Liberian and Sierra Leonean children born during and after the 2012 UNHCR cessation clause, and the subsequent closure of the Oru refugee camp in Nigeria, have grown up in conditions of protracted displacement and de facto statelessness. Many of these children have been exposed to multiple forms of trauma, including witnessing violence as well as physical, emotional, and sexual adversities, within a complex and resource-constrained environment. Many also experience cultural-linguistic disruptions, including heritage-language attrition and increased reliance on host-country languages, which may be associated with challenges in identity formation and social integration. However, little is known about how trauma exposure interacts with language-related factors to influence PTSD and complex PTSD (CPTSD)-related functional impairment among stateless refugee children.

Methods: Using a cross-sectional design, 320 children aged 6-17 years (180 Liberian, 140 Sierra Leonean) were assessed. Trauma exposure was measured using the Child and Adolescent Trauma Screen (CATS), and PTSD/CPTSD functional impairment using the International Trauma Questionnaire-Child and Adolescent Version (ITQ-CA). Heritage- and host-language proficiency were assessed using a structured sociolinguistic questionnaire. Multivariate covariance analyses were conducted using SPSS to examine main and interaction effects.

Results: Multivariate analyses revealed that poorer host-language communication was associated with higher PTSD-related functional impairment (F(3, 311) = 2.85, p = .038, partial eta-squared = .027), whereas CPTSD impairment was largely unaffected. Native-language proficiency also predicted PTSD impairment (F(3, 290) = 3.44, p = .017, partial eta-squared = .034), and children with low heritage-language skills, limited parental/home-language exposure, and no Nigerian-language use showed the highest CPTSD impairment. Emotional connection to the native language provided a modest protective effect. Combined heritage- and host-language exposure was linked to lower trauma-related functional impairment, particularly for children at higher risk of CPTSD. Witnessed trauma emerged as the strongest predictor of functional impairment among refugee children, with CPTSD outcomes showing greater sensitivity (partial eta-squared = .153) than PTSD (partial eta-squared = .076).

Conclusions: Heritage-language competence and bilingual proficiency were associated with reduced PTSD-related functional impairment, whereas CPTSD was more strongly shaped by cumulative relational trauma. These findings highlight the potential value of interventions that support bilingual development and heritage-language preservation as pathways to resilience among stateless refugee children.

Keywords: language attrition; bilingual competence; trauma exposure; refugee children; CPTSD; PTSD; functional impairment

18
A Blinded Comparative Evaluation of Clinical and AI-Generated Responses to Otologic Patient Queries

Akinniyi, S.; Jain-Poster, K.; Evangelista, E.; Yoshikawa, N.; Rivero, A.

2026-04-15 otolaryngology 10.64898/2026.04.14.26350677 medRxiv
Top 0.2%
0.2%

Objective: To assess the quality, empathy, and readability of large language model (LLM) responses to otologic questions from patients, as compared with verified physician responses in patient-driven forums, and to gauge the potential utility of LLMs in patient-centered communication.

Study Design: Comparative study.

Setting: Internet.

Methods: A sample of 49 otology-related questions posted on Reddit r/AskDocs between January 2020 and June 2025 was selected using search terms including "hearing loss," "ear infection," "tinnitus," "ear pain," and "vertigo." Posts were retrieved using Reddit's "Top" filter. Each question was answered by a verified doctor on Reddit and by three AI LLMs (ChatGPT-4o, ClaudeAI, Google Gemini). Responses were scored by five evaluators.

Results: Common otologic concerns posed in patient questions were otalgia (38.7%), vertigo (28.6%), tinnitus (24.5%), hearing loss (22.4%), and aural fullness (20.4%). LLM responses were longer than physician responses (mean 145 vs 67 words; p < .05) and rated higher in quality (10.95 vs 9.58), empathy (7.26 vs 5.18), and readability (4.00 vs 3.73) (all p < .05). Evaluators correctly identified AI versus physician responses in 89.4% of cases, with higher sensitivity for detecting physician responses (93.5%). By Flesch-Kincaid grade level, ChatGPT produced the most readable content (mean 7.25), while ClaudeAI responses were more complex (11.86; p < .05).

Conclusion: LLM responses received higher ratings in quality, empathy, and readability than those of physicians in response to a variety of otologic concerns. When appropriately implemented, such systems may enhance access to understandable otologic information and complement clinician-delivered care.
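
The Flesch-Kincaid scoring step can be reproduced with the textstat package; the sample responses below are invented placeholders, not study data.

```python
# Sketch only: Flesch-Kincaid grade level for two response styles.
import textstat

physician = "This is likely swimmer's ear. See your doctor for drops."
chatbot = ("Your symptoms are consistent with otitis externa, an infection of "
           "the external auditory canal, which typically responds to topical "
           "antibiotic drops prescribed after otoscopic examination.")

for label, text in [("physician", physician), ("LLM", chatbot)]:
    print(label, "FK grade:", textstat.flesch_kincaid_grade(text))
```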

19
Feasibility Study on a Noninvasive Assessment of ALS Patient Emotional State

Garbey, M.; Lesport, Q.; Oztosun, G.; Heidebrecht, M.; Pirouz, K.; Bayat, E.

2026-03-24 neurology 10.64898/2026.03.18.26348710 medRxiv
Top 0.2%
0.2%

This study addresses the need for objective, real-time assessment of emotional responsiveness and coping strategies in individuals with Amyotrophic Lateral Sclerosis (ALS) to support personalized care. We used non-invasive speech analysis and data science methods on an expanded cohort comprising 28 ALS patient visits. We first demonstrate that commonly available artificial intelligence tools, including current-generation large language models (LLMs) such as ChatGPT, Gemini, and Claude, do not provide reliable or reproducible assessments of patient concern levels in the absence of expert clinical supervision. Further, we observe a discrepancy between subjective metrics and objective metrics such as forced vital capacity for breathing. We introduce a novel functional classification system that contextualizes clinician-rated emotional concern relative to the patient's functional impairment as measured by the ALS Functional Rating Scale (ALS-FRS). Patient responses are categorized as:

- Congruent: emotional responsiveness is proportional to functional impairment.
- Muted: emotional response is lower than expected given functional impairment.
- Excessive: emotional response exceeds that expected given functional impairment.

20
Sensorimotor mapping of volitional facial movements in Tourette Syndrome

Smith, C. M.; Houlgreave, M. S.; Asghar, M.; Francis, S. T.; Jackson, S. R.

2026-04-04 neuroscience 10.64898/2026.04.02.712172 medRxiv
Top 0.2%
0.2%

Background: Tourette Syndrome (TS) is a neurodevelopmental movement disorder involving involuntary motor and vocal tics, believed to be characterised by disordered neural inhibition. Cortical representations have previously been manipulated by disruptions in the inhibitory neurotransmitter γ-aminobutyric acid (GABA). However, while facial tics are the most reported motor tic, it is unclear if facial sensorimotor representations differ in TS.

Methods: Sixteen individuals with Tourette Syndrome (TS) or chronic tic disorder and twenty typically developing (TD) control participants underwent 3-Tesla functional magnetic resonance imaging (fMRI). Blood-oxygenation level-dependent (BOLD) responses were measured during a block-design task comprising cued facial movements of common facial tics (blinking, grimacing, and jaw clenching). Activations in bilateral pre- and post-central cortices and supplementary motor areas (SMA) were examined. Conjunction analyses identified voxels commonly and uniquely activated across movements within each group.

Results: Both groups showed significant activations in the bilateral sensorimotor cortices and SMA in response to blink, grimace, and jaw clench movements, with no significant between-group differences. Between-group similarities were lowest for unique blink maps. Common voxel maps also revealed low between-group similarity, with reduced sensorimotor activation and no shared SMA activation across movements in the TS group.

Conclusion: Voluntary facial sensorimotor representations do not differ between groups. However, low similarities between group-unique blink maps may reflect the greater prevalence of blinking tics in TS. Additionally, reduced overlap in sensorimotor activation and absent common SMA engagement across cued movements in the TS group may indicate altered motor integration or action initiation.